
Fix: distributed sampler for multi GPU #3053

Merged — 57 commits merged into main on Dec 9, 2024
Conversation

ori-kron-wis (Collaborator)

Updates the dataloader to handle a distributed sampler.

Closes #3047
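For context, a distributed sampler shards the dataset indices across ranks so that each GPU iterates a disjoint subset of the data. Below is a minimal pure-Python sketch of the interleaved sharding scheme that PyTorch's `DistributedSampler` uses; the helper name `shard_indices` is hypothetical and is only meant to illustrate the idea, not the scvi-tools implementation in this PR.

```python
import math

def shard_indices(dataset_len, num_replicas, rank, drop_last=False):
    """Return the indices one rank should iterate, mimicking the
    interleaved assignment of torch.utils.data.DistributedSampler."""
    indices = list(range(dataset_len))
    if drop_last:
        # Truncate so every rank gets the same number of samples.
        total = (dataset_len // num_replicas) * num_replicas
        indices = indices[:total]
    else:
        # Pad by wrapping around so the index list divides evenly.
        total = math.ceil(dataset_len / num_replicas) * num_replicas
        indices += indices[: total - len(indices)]
    # Rank r takes every num_replicas-th index, starting at r.
    return indices[rank::num_replicas]

# 10 samples over 3 ranks; the tail is padded by wrap-around:
print(shard_indices(10, 3, 0))  # [0, 3, 6, 9]
print(shard_indices(10, 3, 1))  # [1, 4, 7, 0]
print(shard_indices(10, 3, 2))  # [2, 5, 8, 1]
```

Note that without `drop_last`, sample 0 and sample 1 are duplicated onto ranks 1 and 2 so every rank yields the same number of batches, which is why uneven tails matter for validation metrics.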

@ori-kron-wis ori-kron-wis added the cuda tests (Run test suite on CUDA) label Nov 25, 2024
@ori-kron-wis ori-kron-wis added this to the scvi-tools 1.3 milestone Nov 25, 2024
@ori-kron-wis ori-kron-wis self-assigned this Nov 25, 2024
codecov bot commented Nov 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.61%. Comparing base (3b78d2d) to head (2f780bb).
Report is 1 commit behind head on main.

❗ There is a different number of reports uploaded between BASE (3b78d2d) and HEAD (2f780bb); HEAD has 7 fewer uploads than BASE.

Flag      BASE (3b78d2d)   HEAD (2f780bb)
          10               3
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3053      +/-   ##
==========================================
- Coverage   90.06%   84.61%   -5.45%     
==========================================
  Files         178      178              
  Lines       15125    15126       +1     
==========================================
- Hits        13622    12799     -823     
- Misses       1503     2327     +824     
Files with missing lines Coverage Δ
src/scvi/dataloaders/_concat_dataloader.py 100.00% <100.00%> (ø)

... and 22 files with indirect coverage changes

@ori-kron-wis ori-kron-wis marked this pull request as ready for review November 28, 2024 11:24
@ori-kron-wis ori-kron-wis changed the title Feat: Fix distributed sampler for multi GPU Fix: distributed sampler for multi GPU Dec 5, 2024
@@ -45,6 +53,12 @@ def __init__(
self._shuffle = shuffle
self._batch_size = batch_size
self._drop_last = drop_last
self._distributed_sampler = distributed_sampler
# self._drop_dataset_tail = (
Member:
Do we still need this? Shouldn't drop_last have the same effect?
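The question here is whether `drop_last` already covers the tail-dropping behavior. With `drop_last=True`, the sampler discards the uneven tail instead of padding it by wrap-around, so each rank yields the same sample count without duplicates. A quick sketch of that case (the helper name `shard_drop_last` is hypothetical, same assumptions as the sharding sketch above):

```python
def shard_drop_last(dataset_len, num_replicas, rank):
    """Drop the uneven tail so every rank gets an identical sample count."""
    # Largest multiple of num_replicas that fits in the dataset.
    total = (dataset_len // num_replicas) * num_replicas
    return list(range(total))[rank::num_replicas]

# 10 samples over 3 ranks: the tail sample (index 9) is dropped.
print(shard_drop_last(10, 3, 0))  # [0, 3, 6]
print(shard_drop_last(10, 3, 1))  # [1, 4, 7]
print(shard_drop_last(10, 3, 2))  # [2, 5, 8]
```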

@@ -124,6 +124,7 @@ def __init__(
drop_last=drop_last,
drop_dataset_tail=drop_dataset_tail,
shuffle=shuffle,
# **kwargs,
Member:
Which kwargs are missing here?

@canergen canergen (Member) left a comment:
LGTM. See minor comments.

@ori-kron-wis ori-kron-wis added the multiGPU tests (Run test suite on Multi GPU CUDA workstation) label and removed the cuda tests (Run test suite on CUDA) label Dec 9, 2024
@ori-kron-wis ori-kron-wis added the on-merge: backport to 1.3.x label and removed the on-merge: backport to 1.2.x label Dec 9, 2024
ori-kron-wis (Collaborator, Author)

Multi-GPU tests are not working in the runner but pass locally; I will continue with this in scvi-tools v1.3 to make it work with other models.

@ori-kron-wis ori-kron-wis merged commit 9a30f1f into main Dec 9, 2024
14 of 15 checks passed
lumberbot-app bot commented Dec 9, 2024

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it:
git checkout 1.3.x
git pull
  2. Cherry-pick the first parent of this PR on top of the older branch:
git cherry-pick -x -m1 9a30f1fe217e70fe5b9e372afe928b284c22fef3
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
git commit -am 'Backport PR #3053: Fix: distributed sampler for multi GPU'
  4. Push to a named branch:
git push YOURFORK 1.3.x:auto-backport-of-pr-3053-on-1.3.x
  5. Create a PR against branch 1.3.x. I would have named this PR:

"Backport PR #3053 on branch 1.3.x (Fix: distributed sampler for multi GPU)"

And apply the correct labels and milestones.

Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

Labels
multiGPU tests (Run test suite on Multi GPU CUDA workstation) · on-merge: backport to 1.3.x · Still Needs Manual Backport
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multi-GPU fails when using ConcatDataLoader
2 participants